GraphZip: Mining Graph Streams using Dictionary-based Compression
Authors
Abstract
A massive amount of data generated today on platforms such as social networks, telecommunication networks, and the internet in general can be represented as graph streams. Activity in a network’s underlying graph generates a sequence of edges in the form of a stream; for example, a social network may generate a graph stream based on the interactions (edges) between different users (nodes) over time. While many graph mining algorithms have already been developed for analyzing relatively small graphs, graphs that begin to approach the size of real-world networks stress the limitations of such methods due to their dynamic nature and the substantial number of nodes and connections involved. In this paper we present GraphZip, a scalable method for mining interesting patterns in graph streams. GraphZip is inspired by the Lempel-Ziv (LZ) class of compression algorithms, and uses a novel dictionary-based compression approach to discover maximally compressing patterns in a graph stream. We experimentally show that GraphZip is able to retrieve complex and insightful patterns from large real-world graphs and artificially generated graphs with ground truth patterns. Additionally, our results demonstrate that GraphZip is both highly efficient and highly effective compared to existing state-of-the-art methods for mining graph streams.
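For readers unfamiliar with the Lempel-Ziv family mentioned in the abstract, the following is a minimal sketch of LZ78-style dictionary compression on plain strings. It is not GraphZip itself: the abstract does not describe GraphZip's data structures, so the function names `lz78_compress` and `lz78_decompress` and the use of character strings instead of graph patterns are assumptions made purely for illustration. The sketch only shows the shared idea of growing a dictionary of previously seen phrases and encoding new input by reference to it.

```python
# Illustrative LZ78 sketch on strings (an assumption for exposition,
# not the GraphZip algorithm, which operates on graph patterns).

def lz78_compress(text):
    """Return (pairs, dictionary) where each pair is (phrase_index, next_char)."""
    dictionary = {"": 0}   # phrase -> index; index 0 is the empty phrase
    phrase = ""            # longest known phrase matched so far
    pairs = []
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending the match while the phrase is already known.
            phrase = candidate
        else:
            # Emit a reference to the known prefix plus the new character,
            # then remember the extended phrase for later reuse.
            pairs.append((dictionary[phrase], ch))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it as prefix + last char.
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs, dictionary


def lz78_decompress(pairs):
    """Rebuild the original string from (phrase_index, next_char) pairs."""
    phrases = [""]         # index -> phrase, mirrors the compressor's dictionary
    pieces = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)


if __name__ == "__main__":
    pairs, dictionary = lz78_compress("abababab")
    print(pairs)
    print(lz78_decompress(pairs))
```

In this string analogue, phrases that recur often end up in the dictionary and are reused by index. The abstract describes GraphZip as applying a dictionary-based approach of this general kind to patterns in a graph stream.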
Similar resources
GraphZip: Dictionary-based Compression for Mining Graph Streams
A massive amount of data generated today on platforms such as social networks, telecommunication networks, and the internet in general can be represented as graph streams. Activity in a network’s underlying graph generates a sequence of edges in the form of a stream; for example, a social network may generate a graph stream based on the interactions (edges) between different users (nodes) over t...
A framework for clustering massive graph streams
In this paper, we examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures which may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult. Consequently, most techniques for clustering multidimensional data are difficult to generalize to the ...
Efficient Optimal Recompression
An efficient variant of an optimal algorithm is presented, which reorganizes data that has been compressed by some on-the-fly compression method, into a more compact form, without changing the decoding procedure. The algorithm accelerates and improves the space requirements of a known technique based on a reduction to a graph-theoretic problem, by reducing the size of the graph, without affecting the...
Frequent Pattern Mining from Dense Graph Streams
As technology advances, streams of data can be produced in many applications such as social networks, sensor networks, bioinformatics, and chemical informatics. These kinds of streaming data share a property in common—namely, they can be modeled in terms of graph-structured data. Here, the data streams generated by graph data sources in these applications are graph streams. To extract implicit,...